Systematic and Fully Automated Identification of Protein Sequence Patterns

Authors

  • Reece Hart
  • Ajay K. Royyuru
  • Gustavo Stolovitzky
  • Andrea Califano
Abstract

We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSITE families that are defined by patterns and contain DR records). Splash generates patterns with better specificity and undiminished sensitivity, or vice versa, in 28% of the families; identical statistics were obtained in 48% of the families, worse statistics in 15%, and mixed behavior in the remaining 9%. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with the corresponding PROSITE pattern. The procedure is fast enough to be used for daily curation of existing motif and profile databases. Our results also show that the statistical significance of discovered patterns correlates well with their biological significance. The trypsin subfamily of serine proteases is used to illustrate the method's ability to exhaustively discover all motifs in a family that are statistically and biologically significant. Finally, we discuss applications of sequence patterns to multiple sequence alignment and to the training of more sensitive score-based motif models, akin to the procedure used by PSI-BLAST. All results are available at http://www.research.ibm.com/spat/.
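The abstract compares patterns by sensitivity, specificity, and site overlap. The sketch below is not the Splash discovery algorithm; it is a minimal illustration, under stated assumptions, of how a PROSITE-style pattern could be matched against a set of sequences and scored. All function names are hypothetical. It assumes that true family membership is known in advance (for example, taken from a family's DR records), computes "specificity" as precision, TP / (TP + FP) (an assumption about the exact definition), and defines site overlap as the fraction of the shorter site covered by the other (also an assumption). Only a common subset of PROSITE pattern syntax is handled.

```python
import re

# Hypothetical helper for a subset of PROSITE pattern syntax:
# 'x' (any residue), a single residue letter, '[ABC]' (any of), '{ABC}' (none of),
# optional repetition '(n)' or '(n,m)', '-' separators, '<' / '>' end anchors,
# and an optional terminating '.'.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into an equivalent Python regex string."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):  # pattern anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):  # pattern anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    atoms = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            atom = "."
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"  # excluded residues become a negated class
        else:
            atom = core  # single residue or [..] class uses the same syntax in regex
        if lo is not None:
            atom += "{" + lo + ("," + hi if hi else "") + "}"
        atoms.append(atom)
    return prefix + "".join(atoms) + suffix


def find_sites(regex: str, seq: str) -> list[tuple[int, int]]:
    """Half-open [start, end) spans of non-overlapping matches in one sequence."""
    return [m.span() for m in re.finditer(regex, seq)]


def score_pattern(regex: str, sequences: dict[str, str], family: set[str]) -> dict:
    """Count hits against an assumed known family; 'precision' stands in for specificity."""
    hits = {name for name, seq in sequences.items() if re.search(regex, seq)}
    tp = len(hits & family)   # family members the pattern matches
    fp = len(hits - family)   # non-members the pattern matches
    fn = len(family - hits)   # family members the pattern misses
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def site_overlap(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Fraction of the shorter half-open site covered by the other site."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return inter / shorter if shorter > 0 else 0.0


if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")
    toy = {"a": "MCAACAAAL", "b": "MCAAAAAA", "c": "GCAACWWWV"}  # made-up sequences
    print(score_pattern(regex, toy, {"a", "b"}))
    # prints {'tp': 1, 'fp': 1, 'fn': 1, 'sensitivity': 0.5, 'precision': 0.5}
```

Translating patterns to regular expressions keeps the matching step trivial; a real comparison of discovered and PROSITE patterns would run over a full sequence database rather than a toy dictionary, and `re.finditer` reports only non-overlapping matches, which may miss overlapping sites.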


Similar resources

Identification of selected monogeneans using image processing, artificial neural network and K-nearest neighbor

Abstract Over the last two decades, improvements in computational tools have contributed significantly to classifying images of biological specimens into their corresponding species. Today, identifying biological species is much easier for taxonomists and even non-taxonomists thanks to automated computer techniques and systems. In this study, we d...



A systematic review and Qualitative meta-analysis on the Identification patterns in Specific Learning Disorder

An accurate identification serves as the pathway that can guide the therapist towards the ultimate goal of adopting appropriate therapeutic and rehabilitation methods. Therefore, the present study aimed to conduct a systematic review and qualitative meta-analysis of identification patterns in Specific Learning Disorder (SLD). The data in this qualitative meta-analysis was all study related to key...


Location proteomics: a systems approach to subcellular location.

Systems Biology requires comprehensive systematic data on all aspects and levels of biological organization and function. In addition to information on the sequence, structure, activities and binding interactions of all biological macromolecules, the creation of accurate predictive models of cell behaviour will require detailed information on the distribution of those molecules within cells and...


A Framework for Exploring the Frequent Patterns based on Activities Sequence

In recent years, the growing use of location-based tools has made it possible to produce geometric trajectories from users' movement paths. In this way, users' travel goals and related activities can be considered in addition to the geometry and route shape. The user activity trajectory represents the sequence of visited activities, and its related analysis as presented i...


Journal:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 7, Issue 3-4

Pages: -

Publication date: 2000